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: Abstract 

O 

■ In this paper, we study a hierarchical SSOR (HSSOR) method which could be used as a standalone method 

^1 or as a smoother for a two-grid method. It is found that the method leads to faster convergence compared 

to more costly incomplete LU (ILU(O)) with no fill-in, the SSOR, and the Block SSOR method. Moreover, 
Q> I for a two-grid method, numerical experiments suggests that HSSOR can be a better replacement for SSOR 

smoother both having no storage requirements and have no construction costs. Using Fourier analysis, ex- 
\ pressions for the eigenvalues and the condition number of HSSOR preconditioned problem is derived for the 

/j^ . three-dimensional isotropic model problem. 

1^ 
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> ■ 1 Introduction 

^: 

0^ I Fourier analysis has been an indispensable tool in understanding existing algorithms and designing newer 

algorithms in computational science. Similar techniques have been used extensively for analyzing the iterative 
' methods for solving linear systems of the form, 

CN ■ Ax = h (1) 



X: 



which lies at the heart of plenty of scientific simulations ranging from computational fiuid dynamics to sparse 
numerical optimization. 

The matrix A may be ill-conditioned and some preconditioning is often necessary during an iterative proce- 
dure. A possible preconditioned linear system is a transformation of the linear system ([T]) to B~^Ax — B~^b 
where B is an approximation to A. The basic linear iteration for solving the preconditioned linear system 
([T]) above is given as follows 



B-^ib-Ax""). 



The iteration is stopped as soon as the residual b — Ax"^ is small enough under suitable norm. However, 
the fixed point iteration shown above is very slow, and it is usually replaced by a more sophisticated and 
relatively robust Krylov subspace based methods |12j where the solution is improved with the help of Krylov 
space { 6, B~^Ab, . . . , {B~^Ayb }. When the preconditioned operator B~^A is symmetric positive definite, 
a popular choice is the conjugate gradient method [12]. The convergence of the conjugate gradient method 
depends on the condition number of B^^A which in this case is simply the ratio of its largest and smallest 
eigenvalue. 



^Some part of this work was done when the author was supported by Fends de la recherche scientifique (FNRS)(Ref: 2011/V 
6/5/004-IB/CS-15) and the renewed contract (Ref: 2011/V 6/5/004-IB/CS-9980) at Universite Libre de Brussels, Belgique and 
the remaining work was done at KU Leuven, Leuven, Belgium 
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Most of the properties such as eigenvahie bounds and the condition number of the classical preconditioners 
including Jacobi, Gauss-Siedel, SSOR, alternating direction method (ADI), sparse incomplete LU with no 
fill, i.e., ILU(O), and modified ILU (MILU) which were earlier obtained by difficult analysis, were obtained 
easily and elegantly via Fourier analysis by Chan and Elman in [^. For a two dimensional model problem, 
Fourier analysis was used to determine the condition number of block MILU for a hyperbolic model problem 
[To] , Using a similar approach, the condition number of ILU(O) and MILU for a three dimensional anisotropic 
model problem is derived in [4]. On the other hand, for two-grid and multigrid methods, Fourier analysis 
has been used to understand the action of the smoother in damping the error [14j . Recently, the author used 
similar analysis to determine the condition number of a filtering preconditioner [5]. 

A preconditioner known as RNF(0,0) was first introduced in [7]. It is a modified form of the nested factoriza- 
tion method introduced in [1]. For simplicity and its resemblance to SSOR method, we shall call this method 
Hierarchical SSOR (HSSOR) . We call it hierarchical because the preconditioner is built hierarchically using 
the SSOR preconditioner on lower dimensions thereby using the structure of the matrix. Like point-wise 
SSOR method, the HSSOR method has no storage requirements and has no construction cost. On the other 
hand, unlike block SSOR method where explicit factorization of 2D blocks are required, for HSSOR, no ex- 
plicit factorization of any lower dimensional block is ever performed. In fact, the 2D plane blocks themselves 
have SSOR factors. The convergence of the method is faster compared to ILU(O) as shown in this paper. 
Our empirical results suggests that HSSOR is a better replacement for SSOR (or Gauss-Siedel) smoother 
which is widely employed in the two-grid schemes. For symmetric positive definite coefficient matrix, it was 
proved in [7] that the HSSOR method (called RNF(0,0) in [7]) is convergent. In this paper, we derive the 
precise expression for eigenvalue and the condition number of the HSSOR preconditioned matrix for the 
three-dimensional isotropic model problem. Since our analysis uses the same model problem used in [4] , our 
condition number estimate can be compared to that of ILU(O). 

The rest of this paper is organized as follows. In section 2, we introduce some notations, the model problem 
that we use, and finally we discuss the HSSOR method. In section 3, the Fourier eigenvalues and eigenvectors 
are derived, and several properties including the condition number estimate is presented. The numerical 
experiments and comparisons are presented in section 4. Finally, section 5 concludes the paper. 



2 Model problem and the preconditioner 

The model problem is the following three-dimensional anisotropic equation: 

~{liUxx + l2Uyy + hu:,z)=r (2) 

defined on a unit cube = { < x, y, z < 1 }, with /i, ^2, /a > 0, and with the periodic boundary conditions 
as follows 

u(a;, y, 0) = w(a;, y, 1), m(x, 0, z) = m(x, 1, z), and u(0, y, z) = 7i(l, y, z). (3) 

The discretization scheme considered in the interior of the domain is the second order finite differences on 
a uniform n x n x n grid, with mesh size h — l/{n + 1) along x, y, and z directions. Here we shall use 
the notation h to denote the mesh size for the periodic case. With this discretization, we get a system of 
equation 

Au = b. (4) 

It is useful to express the matrix A arising from the periodic boundary conditions using the notation of 
circulant matrices and the Kronecker products. We introduce these notations as follows. 

Definition 1. Let C be a matrix of size pqxpq. We call C a block circulant matrix if it has the following 
form 
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C = BcirCp{Co, Cp-i, • ■ • , C2, Ci) 



^ Co Cp_i ■ ■ • C2 Ci ^ 
Ci Co Cp-i '■ : 
: Ci Co ■ • : 



Cp_2 
\ Cp-l Cp-2 



Cp-i 
Ci Co / 



where each of the blocks Ci are matrices of size q x q each. We observe that a block circulant matrix is 
completely specified by a block row. However ii q — 1, then we call it circulant matrix and denote it by 
cirCp{Co,Cp-i, - ■ ■ ,C2,Ci). 

Notation 1. Further, for block circulant tridiagonal matrices we introduce the following notation 

( Co Ci C2 \ 

C2 ■ • • ■ • • 



Bctridp{C2,Co,Ci) = 



Co Ci 
C2 Co / 



where each of the blocks Ci are matrices of size q x q each. However ii q = 1, then we denote it by 
ctridp{C2,CQ,Ci). 

Notation 2. For block tridiagonal matrix with constant block bands we introduce the following notation 

/ Fo Fi \ 



Btridp{F2,Fo,Fi) 



F2 



Fo Fi 
F2 Fo I 



where each of the blocks Fi are matrices of size qx q each. If (7 = 1, then we denote it by tridp{F2, Fo,Fi). 

Definition 2. The Kronecker product (g) is an operation on two matrices of arbitrary size resulting in a 
block matrix. Let A = {ai_j) and B = (bij), then by A® B we mean 



( a\\B a\2B ... ainB \ 



A®B = 



\ a„iB an2B 



B J 



If the difference operators are scaled by step size /i^, then equation of (|4]) corresponding to the (i, j, k)*-^ grid 
point is the following: 



^ij^k'^i.jjk ~t~ ^i,j,/cWz+l,j,fc ~\~ ^i^j.k'^l.j + l.k di j }^V>i—\ jj^ 

~t~ ^ij^k'^ij — l,k fi.j,k'^i^j,k-\'l ~t~ Qi^j ^k^i^j .k — 1 — ^i^j^k-, 

where i < i, k < n, and 

bi,j,k =0, i = n, dj^k =0, j = n, 
fij,k =0, k = n, di,j^k = 0, i = 1, 
ei,j,k =0, j = 1, gij^k = 0, = 1. 



(5) 



(6) 
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For an anisotropic model problem, we have aij,k = 2(^i + I2 + h), ^ij'.fe = — ^i, '^i.j,k = —hi di.j,k = — ^1, 
Gij.k = —I2, fij.k = — ^3, 9i,j,k = —^3, wherB Wij.k — h^r{i, j, k). Here the subscript (i, j, k) corresponds to 
the grid location (ih, jh, kh). Let Ik denote the identity matrix of size k x k. Using the notation of circulant 
matrix and Kronecker product, the coefficient matrix corresponding to formula (O is expressed as follows 

A = Bctridn{-hln^, 0,-1^1^2), D = Bctrid^ {-hin, D, -hin) , D = ctridn {-li,d,-li) . 

We consider now the same problem (1211) with the following Dirichlet boundary condition 

u{x,y,0) = 0, m(x,0, z)=0, u{0,y,z) — 0. (7) 

To differentiate the Dirichlet problem with that of periodic problem, we shall use bold face letters to denote 
the matrices corresponding to the Dirichlet case. Using second order finite differences with the Dirichlet 
boundary conditions ([T]) above, we obtain the matrix A corresponding to the Dirichlet case as follows 

A = D + Li + + L2 + + L3 + LJ, 

where 

L3 = _Btriff„(-?3/„2,0,0), L2 = /„ «) Btridn{-l2ln,0,0), Li = In^ ^ trid„(-Zi, 0, 0). 
For the above model problem the HSSOR preconditioner B for the Dirichlet problem is defined as follows: 

B = (P + L3)(I + P-^LJ), 

P = (T + L2)(l + T-iLj), (8) 
T^(M + Li)(l + M-iL^^), 

where M = diag{A). The HSSOR preconditioner defined above is named RNF(0,0) preconditioner in [7]. 
Using the notation of circulant matrix and the Kronecker product, the HSSOR preconditioner for the periodic 
boundary condition is now defined as follows 



B 


= {P + L3){I + P-'L^), Pis of 


3 3 

Size n X n 


L3 


= Bdrc„(0,--- ,0,-/3/„2), 




^3 


= Sdrc„(0,-/3/„2,0,--- ,0), 




P 


= In Po, Po is of size x ti^ , 




Po 


= {f + L2){I + T-^Ll), 




L2 


= Sdrc„(0,--- ,0,-/2-^n)> 




jT 
^2 


= BcirCniO, ~l2ln,0, ■ ■■ ,0), 




f 


= /„ ® To, To is of size n x n, 




To 


= circn{m + l\/m, —li, 0, • • • , 0, — 




m 


= d. 





It can be proved that i? is a circulant matrix. 



3 Fourier analysis of HSSOR 

In this section, we derive the Fourier eigenvalues of the HSSOR preconditioned matrix. For clarity and 
simplicity, we restrict our analysis to the isotropic problem {h = I2 = I3 — however, similar analysis 
holds for the general anisotropic case. In the following, we outline certain assumptions on which our analysis 
will be based. These assumptions are similar to those made in [5], and has been justified their appropriately. 
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Firstly, our analysis is for the HSSOR preconditioner B and the coefficient matrix A corresponding to the 
periodic boundary conditions. Secondly, Fourier analysis is an exact analysis only for constant coefficient 
matrix, which is indeed the case when we have an isotropic model. According to an argument in 2 , the 
extreme eigenvalues for the periodic and the corresponding Dirichlet problems are same provided n = 2nd- 
Here rid + I = l/hd and hd is the mesh size for the Dirichlet problem. 

Eigenvectors of A are found by applying the operator A to eigenvectors v"'*'^. The fc)"* grid component 
of eigenvector u"'*''" is given by 

= e"^»e'^■**e''=«^ (10) 

where l = V-T, = -^s, (pt = and £,r = -^"^^ for r, s, t = 1, • • • , n. The eigenvalues Xs^tA^) of 

the matrix A is determined by substituting ([TU]) for Wij.^ in the left hand side of ([S]), and it is found to be 

A,,t,,(A) = 4 l^isin'^ + /2 + hsin"^^ . (11) 

For circulant matrices following results hold. 

Lemma 1 ([3]). Any circulant matrix of size n share the same set of eigenvectors. 
Using lemma ([T]) above, we have the following result. 

Lemma 2. Let S and R be two given circulant matrices with eigenvalues Xs^t,r{S) and Xs,t^r{R) respectively. 
Then the eigenvalues of S + R and SR corresponding to the (s, t, r)*^ grid point is given as follows: 

1- Xs.t,r{S + R) = Xs,t.,r{S) + Xs,t,r{R)- 
^- Xs^t,r{SR) = Xs^t,r{S)Xs^t,riR) ■ 

Proof. It follows easily using lemma ([1]) above. □ 

Using the lemma [2] above, the eigenvalues Xs,t.r{B~^A) of HSSOR preconditioned matrix is then given by 

K,tAB-'A) = (12) 
where Xs.t,riB) is given hierarchically as follows: 

Xs,t,r {L^) \ 



X,.t,r [B) - iXs,tAP) + KtALs)) (^1 + 

Xs,t,r (P) = {Xs,tAT) + KtAP2)) (i + 



^sAr (P) 



KtAT) 

As,t,r (T) = Xs.t,r (M) + Xs.t,r (^l) + Xs,t.r (if) , 

Xs,t,r (Li) = -he''^", As,t,r (Li) = -lie^'^", 
Xs,t,r {L2) - -he'^^ Xs,t,r {Ll) = ~he-'*\ 

Xs,t,r (Ls) = -^3e'«^ Xs,t.r (^J) = -he^'^^ . (13) 

where Xs,t,r (M) = 6. The eigenvalues for the matrices Li, L2, L3,Ui,U2, U3, and M were found by inspection, 
for instance, if ([5]) denotes the stencil for the original matrix A, then the stencils (or equations) for the 
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Tni^tripps Tn Tj'~> Tji T/F T/F TjF r\T\(\ A/f arp P'ivPTi V)v 




stencil for M = 




stencil for Li — 


—hui^ij.k 


stencil for LJ = 


— llUi^ljk 


stencil for L2 — 


—huij-i^k 


stencil for = 




stencil for = 


^h'U'i.j,k-l 


stencil for L3 = 


—huijM+i 



We recall here that the matrices M, Li, L2, is, Lf , L2, and are all circulant matrices of same size as the 
original coefficient matrix A, thus they share the same set of eigenvectors given by ^TU\\ . After substituting 
the eigenvector pH]) in for u^ j k, a straightforward computation gives the required eigenvalues in (|13p . 
The Fourier eigenvalues for the HSSOR preconditioner B for the isotropic case, 11 — 12 — 13 — 1, is derived 
as follows 

where \s.t,r{P) is derived in a similar way as follows 

\s,tAP) = {KtAT) + KtAL^)) (1 + * 



As,t,r (7") y 



= Xs.tAT) - e'^* - e-^^* + ' ' = A,,i,.(T) + - 2cos{4>t), 

where Xs.t,r{T) = 6 — 2cos{6s). Due to the periodicity of eigenvalues, i.e., 

As,t,r(-B) \{ds,ipt,ir) — ^s,t,r{B) |(2ir-es,2Tr-0t,2ir-5^)i 

we will restrict our domain to (0, tt) instead of (0, 27r). For any arbitrary matrix K , we will use the notation 
Xmin{K) and Ainax(-f^) to dcnotc the minimum and maximum eigenvalues respectively. When the expression 
As,t,r(^) does not depend on one or more of its arguments s, t, or r, in such case, we use the dummy 
argument For instance, As^»^*(-fir) is an expression independent of the arguments t and r. 

Lemma 3. IfOs, 4>t, and S^r Us in the interval (0, tt), then following holds 

1. \rmn{A) = Aia,i(A) > 0, 

(T) = Ai^*^*(r) > 4, 
(P) = Ai,i,,(P)>9/4, 
(B) = Ai, 14(B) > 95/36, 

5- Xmax{T) — \n/2,*AP) 8, 

6. A 

max (P) = A„/2,„/2,*(i') < 81/8, 

7. A 

max (B) = A„/2,„/2,„/2(S) < 7921/648, 
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8. \max{A) = XlxM) < 12- 

Proof. Wc observe that KniniA) = 4(sm2(6'i/2) + sin'^{(l)i/2) + sin^(Ci/2)) = Ai,i4(^) > 0. To prove other 
parts of the lemma, we have \min{T) = 6 — 2cos{9i) = Ai^*^*(T) > 4. Now given x > 1, the expression a; + i 
increases or decreases according as x increases or decreases, consequently, we have 

Amm(-P) = Ami„(T) + - — _ max(2cos(0t)), 

= Ai,*,*(T) + - — ^— — - max(2cos((?ii)), 

= KiAP) > 9/4. 

Similarly, we have 

Xmin {B) = Xmin {P) + , \ - max(2cos(^^)) = \i,i,i{B) > 95/36. 
On the other hand for the upper bounds, we have Xmax{T) = 6 — 2cos{6n/2) = A„/2,*,*(T) < 8, and 

Xmax{P) = Xmax{T) + - min(2cOs((?!)t)) = A„/2,n/2,*(-P) < 81/8. 

Similarly, we have 

Xmax (B) = Xmax{P) + T - min(2cOs(Cr)) = Xn/2,n/2,n/2{B) < 7921/648. 

Finally, it is clear that \max{A) = Xn/2,n/2,n/2{A) < 12. Hence, the lemma is proved. □ 

In the following Lemma, we determine the eigenvalues oi B — A. 
Lemma 4. The eigenvalues of B — A are given as follows 

Xs,tAB -A)= ^ + ^ 



Xs,t,r{T) Xs,t,r{P) 



Proof. We have 



X.^tAB) = Xs,tAP) + . \m - 2cos(Cr), 

= X,,tAT) + Y^-FF\ + ^ - 2(cos(,^i) + cos(er)), 
= 6 - 2(cos(6lt) + cos(0r) + cos(Cr)) + - — 3_. + ^ 



Xs,t,riT) Xs,t,r{P) 

= A {A)+ ^ + ^ 

Xs,t,r{T) Xs,t,r{P) 

The proof is complete. □ 

Following lemma will be useful in estimating condition number of HSSOR. 
Lemma 5. There holds 

X^ax{B - A) ^ Xi,is{B~^ A), 

Xmin{B - A) = Xn/2,n/2,n/2{B^ A). 
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Proof. Using Lemma |H we have 

X,,tAB -A) = Xs.tAB) - KtAA) = , + , \p. , (15) 

and using Lemma [3] above, we have 

{T) Xmin{P) 

Amm(S - A) = — - + — = — — - + — ■ -— = A„/2,„/2,„/2(S ^ A) . 



□ 



Lemma 6. There holds 



[B A) = Xn/2,n/2,n/2{B A). 



Proof. We have 

As,t,r(^) 



Xs,tAB) l + X-UA){Xs.tAB-A)) 



From Lemma [3] above, we have A., ] r{A) > and Xs.t,riB — A) > 0, thus, using Lemma[5]we can write the 
foUowing 

^-"'^"^'^ l.A,UA)(L(i.-^)) ^^'"'^"-^'' '''' 

X,nax{B'^A) = — — = A„/2,„/2,„/2(S"^A). (17) 

t + A,nax[A)[Arnin[-D ^ A)) 

The proof is complete. □ 

The matrix B^^A is symmetric by construction, the following corollary proves that it is also positive definite. 
Corollary 1. The HSSOR preconditioned matrix B^^A is symmetric positive definite. 

Proof. Using Lemma IHl and Lemma [3] above, we have 

Xmin{B ^A) — Xn/2^n/2,n/2{B~^A) = Xs^t,r{A) / Xs^t,r{B) > 0. 

The proof is complete. □ 

The following theorem gives a condition number estimate of HSSOR preconditioned matrix. Let cond(K) 
for any matrix K denote the condition number of K. The notation w will mean 'approximately equal to'. 

Theorem 1. Let h tends to zero, then the condition number of the HSSOR preconditioned matrix B^^A is 
given as follows 



cond{B-^A) 



25(5 + 57r2 + TT^ 



144(37r2(5 + 57r2 + tt*) + Ai:^) 
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Proof. From LemmalSlabove, the condition number of HSSOR preconditioned matrix, cond(i? ^A), is given 
as follows 

Using Lemma [3] and recalling that Og = 27rs/(n + 1) = {2TTh)s, We have cos{di) — cos{2TTh) « 1 — 2TT^h'^, 
consequently, we have Ai^*_*(T) = 6 — 2cos{9i) ~ 4(1 +Tr^h'^), thus using Lemma|3]and cos(0i) = 1 — 2'n'^h?, 
we have 

A™a.(B-A) = Ai4,i(B-A), 

1 1 



Ai,*,*(7^) K*AT) + Ar.l,*(T) - 2cos(0i) ' 
1 1 



4(l + 7r2/i2) 4(l + 7r2;i2) + (i/4)(i + 7,2/j2)-i_2(l-27r2;i2)' 

1 1 



4(l + 7r2/i2) 4(l + 7r2;i2) + (i/4)(i_7r2/i2)_2(l-27r2/i2)' 

1 1 



4(l + 7r2/i2) 2+ 1/4+ (8- l/4)7r2/i2' 

1 4 25 + 477r2/i2 



4(l + 7r2/i2) 9 + 3l7r2/i2 4(1 + 7r2/i2)(9 + 3i7r2/i2) ' 
we have \min{A) = Ai, 14(A) = 4:{sin'^{9i/2 + 0i/2 + £,i/2)) « 12TT^h^. From above estimates and from 



expression (|16p we have 



1 



l + A„k(A)(A™..(i3-A))' 

1 



1 + (l/127r2/i2)(25 + 477r2/i2)/(4(l + 7r2/i2)(9 + 3ln^h^)) ' 

127r2fc2 

~ 127r2/i2 4. (25 + 477r2/i2)/(4(l + 7r2/i2)(9 + 3in^h^)) ' 

_ 127r2/i2 . 4(1 + 7r2fe2)(9 + 317^2^2) 

^ 4 • 127r2/i2(i + 7r2/^2)(9 _^ 31,^2/12) + 25 + 477r2/i2 ' 

_ 4 • 9 • 127r2/i2 + 0(/i4) ^4.9. I27r2/i2 

^ 25 + 0(/i2) 25 ' 

We recall that 0s = 2ns/ {n + 1). For s — n/2, we have 9^/2 — rni/{n + 1) = vr — 7r/(n + 1) = (1 — /i)7r (since, 
l/(n + l) = /i). Wehavecos(6l„/2) « l-(6'2^2)/2 = 1-(1-^)^'^V2 « ((2-7r2) + 27r2/i)/2. Due to symmetry, 
we have cos(0„/2) = cos(^„/2) = cos(0„/2). Consequently, A„/2,*,*(T) = 6 — 2cos{9n/2) — A + — 2n^h. 
Using Lemma SI we have 

Xmin{B - A) ^ A„/2,n/2,n/2(-B - A), 

1 1 



A„/2, *,*(?') A„/2,*,*(T) + A„/2,*,*(^) ^ 2cos((/)„/2) ' 

1 1 



4 + 7r2-27r2/i (4 + 7r2 - 27r2;j) + (4 + 7r2 - 27r2/i)-i - ((2 - 7r2) + 27r2/i)/2' 
1 1 



4 + 7r2 - 27r2/i (4 + 7r2 - 27r2/i) + (4 + 7r2)-i(l + 2772/1/(4 + 7r2)) - ((2 - 7r2) + 2iT'^h)/2' 
1 1 



4 + 7r2-27r2/i 4 + 7r2 + 1/(4 + 7r2) - (2 - 7r2)/2 + 0(/i) ^ 

2(4 + 7r2) + C'(fe) 4 + 7r2 

(4 + 7r2) + 2 - (2 - 7r2)(4 + tx^) + 0(/i) ~ 5 + 57r2 + tt* ' 
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From Lemmaini we have XmaxiA) = A„/2,n/2,n/2(^) — 6(1 — cos0„/2) ~ 6 — 3(2 — tt^ + 2n'^h) — 37r^(l — 2ft,). 
Thus, we have 

Xmax{B ^A)^ - — I — 



1 + A™L(A)(A„„,(B- A))- 
1 



1 + (l/(37r2(l - 2ft))((4 + 7r2)/(5 + + tt*)) ' 

37r2(l-2/i)(5 + 57r2+7r4) 
37r2(l - 2/i)(5 + 57r2 + tt^) + 4 + 7r2 ' 

37r2(5 + 57r2 + 7r4) + 0(ft) 



37r2(5 + 57r2 + tt^) + 4 + 7r2 + 0(ft) ' 
The condition number oi B^^A is given as follows 

37r2(5 + 57r2+7r4) + 0(/i) 25 



cond(B-M) = 

o 

( 144(37r2(5 + 57r2 + tt^) + 47r2) 
The proof is complete. □ 



37r2(5 + 57r2 + tt*) + 4 + 7r2 + 0{h) 4 • 9 • Un^h^ ' 



4 Two grid scheme 

In classical AMG, a set of coarse grid unknowns is selected and the matrix entries are used to build inter- 
polation rules that define the prolongation matrix P, and the coarse grid matrix Ac is computed from the 
following Galerkin formula 

Ac = P'^AP. (18) 

In contrast to the classical AMG approach, in aggregation based multigrid, first a set of aggregates G,; are 
defined. Let Nc be the number of such aggregates, then the interpolation matrix P is defined as follows 




1, ifieGj, 
0, otherwise. 



Here, l<i<iV, l<j< Nc, N being the size of the original coefficient matrix A. Further, we assume that 
the aggregates Gi are such that 

G, n Gj = </>, for i ^ j and U, G, = [1, N] (19) 

Here [1, N] denotes the set of integers from 1 to N. Notice that the matrix P defined above is an x Nc 
matrix, but since it has only one non-zero entry (which are "one") per row, the matrix can be defined by a 
single array containing the indices of the non-zero entries. The coarse grid matrix Ac may be computed as 
follows 

{Ac)^j ^ ^ ^aki 
k£Gi leGj 

where 1 < i, j < Nc, and a^i is the (k, l)th entry of A. 

Numerous aggregation schemes have been proposed in the literature, but in this paper we consider the 
aggregation scheme based on graph matching as follows 

Aggregation based on graph matching: Several graph partitioning methods exists, notably, in software 
form 5 . Aggregation for AMG is created by calling a graph partitioner with required number of aggre- 
gates as an input. The subgraph being partitioned are the aggregates Gi. For instance, in this paper we 
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use this approach by cahing the METIS graph partitioning routine, namely, METIS_PartGraphKway 
with the graph of the matrix and number of partitions as input parameters. The partitioning informa- 
tion is obtained in the output argument "part". The part array maps a given node to its partition, i.e., 
part(i) — j means that the node i is mapped to the jth partition. In fact, the part array essentiahy 
determines the interpolation operator P. For instance, we observe that the "part" array is a discrete 
many to one map. Thus, the ith aggregate Gi = part~^(i), where 

part-i(i) = {je [1, N] I partO') = z } 

Such graph matching techniques for AMG coarsening were also explored in [6l [11]. For notational 
convenience, the method introduced in this paper is called GMG (Graph matching based aggregation 
MultiGrid). 

Let S denote the matrix which acts as a smoother in GMG method. The usual choice of S* is a Gauss-Siedel 
or SSOR preconditioner jl2j . However, in this paper we choose HSSOR as a smoother and compare it with 
SSOR. 

Let M = PAcP'^ denote the coarse grid operator interpolated to fine grid, then the two-grid preconditioner 
without post-smoothing is defined as follows 

B = {S-^ + M-'^ - M-'^AS-^y'^. (20) 

We notice that M^^ w PA^^P^, thus, an equation of the form Mx = y is solved by first restricting y 
to yc — P^y, then solving with the coarse matrix Ac the following linear system: AcXc — yc- Finally, 
prolongating the coarse grid solution Xc to x = Pxc- Following diagram illustrates the two-grid hierarchy. 



Restrict y to yc :— P y 



Prolongate Xc to x :— Px^ 



Solvc:Ac2:c — yc 



5 Numerical experiments 



All the numerical experiments were performed in MATLAB with double precision accuracy on Intel core 
i7 (720QM) with 6 GB RAM. The AMG method introduced in this paper, namely, GMG, is written in 
MATLAB. For GMG, the iterative accelerator used is GMRES available at ^13^, the code was changed such 
that the stopping is based on the decrease of the 2-norm of the relative residual. For both GMRES, the 
maximum number of iterations allowed is 500, and the inner subspace dimension is 30. The stopping criteria 
is the decrease of the relative residual below 10"^°, i.e., when 

m 

Here b is the right hand side and Xk is an approximation to the solution at the fcth step. 



5.1 Test cases 

Diffusion: Our primary test case is the following diffusion Equation. We use the notation DC to indicate 
that the problems are discontinuous. We consider a test case as follows 

— div(K(a;)Vw) — / in i7, 

u = Oondno, (21) 

^" o oo 

— — — on aSZjv, 

on 
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Table 1; Notations used in tables of numerical experiments 



INFnffl f inn q 


A/Tpa ni n Cf 

J. V J. CL 11 11 i- tl 


h 


Discretization step 


N 


Size of the original matrix 


Nr 


Size of the coarse grid matrix 


its 


Iteration count 


time 


Total CPU time (setup plus solve) in seconds 


cf 




ME 


Memory allocation problem 


NA 


Not applicable 


NC 


Not converged within 500 iterations 


SSOR 


Symmetric successive over-relaxation method {uj — 1)|12) 


BSSOR 


Block symmetric successive over-relaxation method (cj = 1)|12] 


HSSOR 


Hierarchical SSOR 


GMG-HS 


Graph based matching for AMG, smoother HSSOR 


GMG-SS 


Graph based matching for AMG, smoother SSOR 



DCl, 2D case: The tensor k is isotropic and discontinuous. The domain contains many zones of 
high permeability that are isolated from each other. Let [x\ denote the integer value of x. For 
two-dimensional case, we define k{x) as follows: 

( \ - { ([1^^ * ^2] + 1), if [10 * x,] = {mod 2), i = 1, 2, 

^ ' \ 1; otherwise. 

The velocity field a is kept zero. We consider a n x n uniform grid where n is the number of 
discrete points along each spatial directions. 

DCl, 3D case: For three-dimensional case, k(x) is defined as follows: 

, , f 103*([10*X2] + 1), if [10*a;,] = (mod 2) , i = 1,2,3, 
^ ^ \ I7 otherwise. 

Here again, the velocity field a is kept zero. We consider a n x n x n uniform grid. 

Isotropic problem: i.e., we have Zl = /2 = /3 = 1 and d = 6 for the model problem ([2T|) . More 
precisely, the matrix is given as follows 

A = Btridn {^-hl^^ ,D , -hl^^^ , D = Btridn {-hin, D, -l2ln) , D = tridn {-li,d,-li) . 

For the 2D case, we have 11 = 12 = 1, and d = 4, the matrix after discretization is given as follows 

A = Btridn {-hin, D, -l2ln) , D = tridn i-h,d,-li) . 

Comments on numerical experiments 

In Table El we compare the two-grid methods, ILU(O), HSSOR, SSOR, and BSSOR. In two grid methods, 
GMG-SS has point SSOR smoother while GMG-HS has HSSOR as a smoother. For the 2D case and for 
l/h> 400, the ILU(O), SSOR, HSSOR, and BSSOR does not converge within 500 iterations, while aU of 
them converge for 3D problem except for large size problem where insufficient memory error occurs. In 
particular, for block SSOR method, we need to store the LU factors corresponding to the 2D blocks. This 
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Table 2: Numerical results for isotropic model problem with cf — 4.5 using GMRES(30), maximum number 
of iterations allowed is 500 



matrix 1/h GMG-HS GMG-SS ILU(O) HSSOR SSOR BSSOR 







its 


time 


its 


time 


its 


time 


its 


time 


its 


time 


its 


time 




400 


39 


6.2 


46 


6.8 


NC 


NA 


NC 


NA 


NC 


NA 


NC 


NA 


2D 


800 


39 


28.9 


47 


37.1 


NC 


NA 


NC 


NA 


NC 


NA 


NC 


NA 




1000 


42 


47.5 


50 


60.9 


NC 


NA 


NC 


NA 


NC 


NA 


NC 


NA 




40 


23 


1.3 


21 


3.7 


55 


1.8 


42 


1.6 


68 


2.7 


127 


83.4 


3D 


80 


25 


24.5 


22 


138.7 


129 


61.9 


89 


38.7 


157 


76.6 


ME 


NA 




100 


26 


72.3 


ME 


NA 


147 


138.3 


113 


106.4 


185 


180.7 


ME 


NA 



Table 3: Numerical results for DCl problem with cf = 3 using GMRES(30). Maximum number of iterations 
allowed is 500 



matrix 1/h GMG-HS GMG-SS ILU(O) HSSOR SSOR BSSOR 







its 


time 


its 


time 


its 


time 


its 


time 


its 


time 


its 


time 




400 


29 


5.7 


35 


10.5 


NC 


NA 


NC 


NA 


NC 


NA 


NC 


NA 


2D 


800 


29 


33.4 


34 


49.5 


NC 


NA 


NC 


NA 


NC 


NA 


NC 


NA 




1000 


29 


58.4 


35 


85.0 


NC 


NA 


NC 


NA 


NC 


NA 


NC 


NA 




40 


247 


18.5 


300 


44.0 


475 


14.2 


NC 


NA 


NC 


NA 


NC 


NA 


3D 


80 


237 


337.5 


281 


895.9 


NC 


NA 


NC 


NA 


NC 


NA 


NC 


NA 




100 


ME 


NA 


ME 


NA 


NC 


NA 


NC 


NA 


NC 


NA 


NC 


NA 



is the reason why we refrain from using BSSOR as a smoother in the two-grid scheme. The iteration count 
as well as CPU time for HSSOR is smaller compared to ILU(O) and the difference between iteration count 
and CPU time increases with the size of the problem. 

In Table [2l as expected, the two-grid methods show mesh independent convergence, and the CPU time and 
iteration count is much less than that of the stand alone methods. Comparing the two-grid versions GMG-SS 
and GMG-HS, we find that GMG-HS is an improvement over GMG-SS, particularly, for the 2D isotropic 
problem. 

In Table [31 we show experimental results for a discontinuous DCl problem both for 2D and 3D problem. 
This problem is difficult compared to isotropic case, we had to keep a smaller cf value of 3. For cf = 4.5, 
the two-grid methods did not converge within 500. Notably, neither of the stand-alone methods converged 
within 500 iterations. However, for cf = 3, the two-grid method shows mesh independent convergence with 
GMG-HS taking relatively less iterations, and takes less CPU time compared to GMG-SS. 
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6 Conclusion 



In this paper, we have obtained a condition number estimate for a hierarchical SSOR method. The estimate 
facihtates comparison with the condition number of ILU(O) obtained in Numerical experiments shows 
that the HSSOR is faster compared to ILU(O), SSOR, and BSSOR as a stand alone preconditioner. Moreover, 
for a two-grid method, we show that HSSOR is an efficient smoother and thus could replace the widely used 
Gauss-Siedel or SSOR smoother. 
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